Multi-phase Word Sense Embedding Learning Using a Corpus and a Lexical Ontology
Abstract
Word embeddings play a significant role in many modern NLP systems. However, most prevalent word embedding methods learn a single representation per word, which is problematic for polysemous and homonymous words. To address this problem, we propose a multi-phase word sense embedding learning method that uses both a corpus and a lexical ontology to learn one embedding per word sense. Our method uses the word sense definitions and the relations between word senses defined in the lexical ontology differently from existing systems. Experimental results on a word similarity task show that our approach produces high-quality word sense embeddings.
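The abstract does not spell out the individual phases, so the following is only a minimal sketch of the general idea of keeping one vector per word sense, not the authors' method. It assumes a common baseline: each sense vector is initialized as the average of the word vectors of its definition words, and two words are compared by the best-matching pair of senses. All names (`WORD_VECS`, `SENSE_GLOSSES`, `init_sense_vectors`, `max_sim`) and all numeric values are hypothetical.

```python
import math

# Toy word vectors (hypothetical values, for illustration only).
WORD_VECS = {
    "money": [1.0, 0.0],
    "loan": [0.9, 0.1],
    "river": [0.0, 1.0],
    "water": [0.1, 0.9],
}

# Toy sense inventory: sense id -> content words of its definition.
# In practice these would come from a lexical ontology; here they are made up.
SENSE_GLOSSES = {
    "bank#finance": ["money", "loan"],
    "bank#river": ["river", "water"],
    "shore#1": ["river", "water"],
}


def mean_vector(words):
    """Average the vectors of the known words in a definition."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    if not vecs:
        raise ValueError("no known words in definition")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def init_sense_vectors(glosses):
    """Assumed baseline: one vector per sense, built from its definition words."""
    return {sense: mean_vector(words) for sense, words in glosses.items()}


def senses_of(word, sense_vecs):
    """Return all sense ids of the form 'word#label'."""
    return [s for s in sense_vecs if s.split("#")[0] == word]


def max_sim(word1, word2, sense_vecs):
    """Word similarity as the highest similarity over all sense pairs."""
    return max(
        cosine(sense_vecs[a], sense_vecs[b])
        for a in senses_of(word1, sense_vecs)
        for b in senses_of(word2, sense_vecs)
    )


if __name__ == "__main__":
    sense_vecs = init_sense_vectors(SENSE_GLOSSES)
    print(max_sim("bank", "shore", sense_vecs))
    print(cosine(sense_vecs["bank#finance"], sense_vecs["shore#1"]))
```

With separate sense vectors, the river sense of "bank" can match "shore" closely while the finance sense stays distant, which a single vector per word cannot express.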
Similar resources
Enriching Ontology Concepts Based on Texts from WWW and Corpus
Despite the growth of ontological engineering tools, ontology knowledge acquisition remains a highly manual, time-consuming and complex task. Automatic ontology learning is a well-established research field whose goal is to support the semi-automatic construction of ontologies starting from available digital resources (e.g., a corpus, web pages, dictionaries, semi-structured and structured...
Developing a Corpus-Based Word List in Pharmacy Research Articles: A Focus on Academic Culture
The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL); a list of the most frequent words from a corpus of 3,458,445 tokens made up of 800 most recent pharmacy texts including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...
context2vec: Learning Generic Context Embedding with Bidirectional LSTM
Context representations are central to various NLP tasks, such as word sense disambiguation, named entity recognition, coreference resolution, and many more. In this work we present a neural model for efficiently learning a generic context embedding function from large corpora, using bidirectional LSTM. With a very simple application of our context representations, we manage to surpass or nearl...
Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
Enriching a lexical semantic net with selectional preferences by means of statistical corpus analysis
Broad-coverage ontologies which represent lexical semantic knowledge are being built for more and more natural languages. Such resources provide very useful information for word sense disambiguation, which is crucial for a variety of NLP tasks (e.g. semantic annotation of corpora, information retrieval, or semantic inferencing). Since the manual encoding of such ontologies is very labour-intens...
Journal title:
- CoRR
Volume abs/1606.04835, Issue -
Pages -
Publication date 2016